John D Noble Project 3: ReCell October 2021

Business Problem

Context

Buying and selling used smartphones used to be something that happened on a handful of online marketplace sites. But the used and refurbished phone market has grown considerably over the past decade, and a new IDC (International Data Corporation) forecast predicts that the used phone market will be worth $52.7bn by 2023, with a compound annual growth rate (CAGR) of 13.6% from 2018 to 2023. This growth can be attributed to an uptick in demand for used smartphones, which offer considerable savings compared with new models.
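The two forecast figures above pin down the market size they imply for the start of the period, because a CAGR compounds once per year: end = start × (1 + CAGR)^years. The sketch below backs out that implied 2018 value. It is a derived number computed from the quoted figures, not a figure reported by IDC, and the helper name `implied_start_value` is hypothetical.

```python
def implied_start_value(end_value, cagr, years):
    """Back out the starting value implied by an end value and a CAGR.

    Uses end_value = start_value * (1 + cagr) ** years.
    """
    return end_value / (1 + cagr) ** years


# $52.7bn in 2023 at a 13.6% CAGR over 2018-2023 (5 compounding years)
base_2018 = implied_start_value(52.7, 0.136, 2023 - 2018)
print(f"Implied 2018 market size: ${base_2018:.1f}bn")
```

Running it prints an implied 2018 market size of roughly $27.9bn, i.e. the forecast amounts to the market almost doubling over five years.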

Refurbished and used devices continue to provide cost-effective alternatives to both consumers and businesses that are looking to save money when purchasing a smartphone. There are plenty of other benefits associated with the used smartphone market. Used and refurbished devices can be sold with warranties and can also be insured with proof of purchase. Third-party vendors and platforms, such as Verizon and Amazon, offer customers attractive deals on refurbished smartphones. Maximizing the longevity of mobile phones through second-hand trade also reduces their environmental impact and helps with recycling and waste reduction. The impact of the COVID-19 outbreak may further boost the cheaper refurbished smartphone segment, as consumers cut back on discretionary spending and buy phones only for immediate needs.


Objective

Build a linear regression model to predict the price of a used phone and identify factors that significantly influence it.
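As a rough illustration of the objective, the sketch below fits an ordinary least squares model with an intercept and reports a t-statistic per coefficient; predictors with large |t| are the ones that significantly influence the target. It is a minimal sketch, assuming the predictors have already been encoded as numeric columns. It uses plain NumPy rather than whatever modelling library the notebook itself uses, the helper `fit_ols` is hypothetical, and the synthetic data stand in for the phone data.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    Returns (coefficients, t_statistics); index 0 is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the model estimates an intercept.
    X1 = np.column_stack([np.ones(len(X)), X])
    n, p = X1.shape
    # Solve for the coefficients with a least-squares solver.
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    # Unbiased estimate of the error variance.
    sigma2 = residuals @ residuals / (n - p)
    # Standard errors from the diagonal of sigma^2 * (X'X)^-1.
    cov = sigma2 * np.linalg.inv(X1.T @ X1)
    se = np.sqrt(np.diag(cov))
    return beta, beta / se


# Synthetic example: only the first predictor affects the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
beta, t = fit_ols(X, y)
print(beta.round(2), t.round(1))
```

The estimated coefficients land close to the true values (2, 3, 0), and the t-statistic for the first predictor is far larger than the one for the irrelevant second predictor.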

Key Questions

Data Dictionary

Data Description

The data contains the different attributes of used/refurbished phones. The detailed data dictionary is given below.


  1. brand_name: Name of manufacturing brand
  2. os: OS on which the phone runs
  3. screen_size: Size of the screen in cm
  4. 4g: Whether 4G is available or not
  5. 5g: Whether 5G is available or not
  6. main_camera_mp: Resolution of the rear camera in megapixels
  7. selfie_camera_mp: Resolution of the front camera in megapixels
  8. int_memory: Amount of internal memory (ROM) in GB
  9. ram: Amount of RAM in GB
  10. battery: Energy capacity of the phone battery in mAh
  11. weight: Weight of the phone in grams
  12. release_year: Year when the phone model was released
  13. days_used: Number of days the used/refurbished phone has been used
  14. new_price: Price of a new phone of the same model in euros
  15. used_price: Price of the used/refurbished phone in euros
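Before any analysis, it can help to confirm that the loaded data actually has the fifteen columns listed in the data dictionary. The sketch below assumes the data is loaded into a pandas DataFrame; the CSV file name in the comment and the helper `check_columns` are hypothetical, since the source file name is not given here.

```python
import pandas as pd

# Expected columns, taken from the data dictionary above.
EXPECTED_COLUMNS = [
    "brand_name", "os", "screen_size", "4g", "5g", "main_camera_mp",
    "selfie_camera_mp", "int_memory", "ram", "battery", "weight",
    "release_year", "days_used", "new_price", "used_price",
]


def check_columns(df):
    """Return (missing, unexpected) column names relative to the dictionary."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    unexpected = [c for c in df.columns if c not in EXPECTED_COLUMNS]
    return missing, unexpected


# In the notebook this would be something like
# df = pd.read_csv("used_phone_data.csv")  # hypothetical file name
# Here a toy frame is missing used_price and has one extra column.
demo = pd.DataFrame(columns=EXPECTED_COLUMNS[:-1] + ["extra"])
missing, unexpected = check_columns(demo)
print(missing, unexpected)
```

For the toy frame this prints `['used_price'] ['extra']`; an empty pair of lists means the data matches the dictionary.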

Import Basic Libraries

1 - Exploratory Data Analysis - Explore & Extract insights from the data.

NOTE: I AM DOING AN INITIAL EDA BEFORE ANY DATA TRANSFORMATIONS TO ESTABLISH SOME BASIC RELATIONSHIPS BETWEEN THE INDEPENDENT AND DEPENDENT VARIABLES.

Insight: In this scenario, 'used_price' is the dependent variable.
  • Some (weak) positive influence on used_price, based on the slope of the line: screen_size, main_camera_mp, selfie_camera_mp, int_memory, ram, battery, weight, release_year, new_price
  • Some negative influence on used_price, based on the slope of the line: days_used
  • The newer the phone, the higher its used price; and, as you would expect, the longer a phone has been used, the less it commands on resale, so there is a general downward relationship.
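The slope-based reading above can be reproduced by fitting a straight line between each numeric column and used_price and keeping the slope. This is a minimal sketch, assuming a pandas DataFrame; the helper `slopes_vs_target` and the toy values are hypothetical. Note that a slope's magnitude depends on the units of each variable, so only its sign is directly comparable across columns.

```python
import numpy as np
import pandas as pd


def slopes_vs_target(df, target="used_price"):
    """Slope of a least-squares line between each numeric column and the target.

    The sign shows the direction of the relationship. Rows with a missing
    value in either column are dropped for that pair only.
    """
    slopes = {}
    for col in df.select_dtypes("number").columns:
        if col == target:
            continue
        pair = df[[col, target]].dropna()
        slope, _intercept = np.polyfit(pair[col], pair[target], deg=1)
        slopes[col] = slope
    return pd.Series(slopes).sort_values()


# Toy data: newer phones sell for more, heavily used phones for less.
toy = pd.DataFrame({
    "days_used": [400, 300, 200, 100],
    "release_year": [2014, 2016, 2018, 2020],
    "used_price": [60.0, 100.0, 120.0, 150.0],
})
slopes = slopes_vs_target(toy)
print(slopes)
```

On the toy data, days_used gets a slope of -0.29 (euros per day) and release_year a slope of 14.5 (euros per year), matching the directions described in the insight.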

NOTE: THE PANDAS PROFILING REPORT DOES NOT RENDER IN THE HTML VERSION OF THIS SUBMISSION.

Insight: BEFORE data preprocessing, this is what we end up with:
  • Most frequent brands: Samsung 364, Huawei 264, LG 212, Lenovo 172, ZTE 141, Xiaomi 134, Oppo 129, Asus 126, Alcatel 125
  • Most frequent OS: Android 3246
  • The majority of phones have release_year < 2017 and have 4G but not 5G

Basic Info
  • Number of variables: 15
  • Number of observations: 3571
  • Missing cells: 215 (0.4%)
  • Duplicate rows: 0 (0.0%)
  • Total size in memory: 418.6 KiB
  • Average record size in memory: 120 B
  • Variable types: 2 categorical, 11 numeric, 2 boolean

Variable Data
  • There are 34 unique brands
  • There are 4 different operating systems
  • Release years range from 2013 to 2020
  • The average screen size is ~15 cm
  • The average main camera resolution is ~9 MP
  • The average selfie camera resolution is ~6.5 MP
  • The average RAM is ~4 GB
  • The average battery capacity is ~3067 mAh
  • The average phone weight is ~179 g
  • The new price ranges from ~9 to ~2560 euros (?); the average new price is ~237, compared with an average used price of ~109
  • The used price ranges from ~2.51 to ~1916 euros
  • Most of the used phones are 4G, not 5G
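Several of the headline numbers above can be recomputed directly with pandas, which is useful because the profiling report does not render in the HTML version. This is a minimal sketch with a hypothetical helper `basic_profile` and toy data; pandas' memory figure may not match the report's exactly. As a sanity check on the reported figures, 215 missing cells out of 3571 × 15 = 53,565 cells is about 0.4%.

```python
import pandas as pd


def basic_profile(df):
    """Recompute a few headline numbers from a profiling report."""
    n_cells = df.shape[0] * df.shape[1]
    missing = int(df.isna().sum().sum())
    return {
        "n_variables": df.shape[1],
        "n_observations": df.shape[0],
        "missing_cells": missing,
        "missing_pct": 100 * missing / n_cells,
        "duplicate_rows": int(df.duplicated().sum()),
        "memory_kib": df.memory_usage(deep=True).sum() / 1024,
    }


# Toy data: one missing brand and one fully duplicated row.
toy = pd.DataFrame({
    "brand_name": ["Samsung", "Huawei", "Samsung", None],
    "used_price": [100.0, 80.0, 100.0, 50.0],
})
profile = basic_profile(toy)
# Brand frequencies, as in the "most frequent brands" line (NaN excluded).
brand_counts = toy["brand_name"].value_counts()
print(profile)
print(brand_counts)
```

On the real data, `df["brand_name"].value_counts().head(9)` would give the brand ranking listed above.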

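Insight: The pandas profiling report raised the following warnings. The report computes several correlation measures, including ones that cover categorical columns such as brand_name and os, so the same pair can be flagged more than once: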

  • screen_size is highly correlated with battery and 1 other field
  • selfie_camera_mp is highly correlated with release_year and 1 other field
  • ram is highly correlated with used_price
  • battery is highly correlated with screen_size and 1 other field
  • weight is highly correlated with screen_size and 1 other field
  • release_year is highly correlated with selfie_camera_mp and 1 other field
  • days_used is highly correlated with selfie_camera_mp and 1 other field
  • new_price is highly correlated with used_price
  • used_price is highly correlated with ram and 1 other field
  • screen_size is highly correlated with selfie_camera_mp and 4 other fields
  • main_camera_mp is highly correlated with selfie_camera_mp and 2 other fields
  • selfie_camera_mp is highly correlated with screen_size and 7 other fields
  • int_memory is highly correlated with selfie_camera_mp and 1 other field
  • battery is highly correlated with screen_size and 4 other fields
  • weight is highly correlated with screen_size and 1 other field
  • release_year is highly correlated with screen_size and 5 other fields
  • days_used is highly correlated with selfie_camera_mp and 2 other fields
  • new_price is highly correlated with main_camera_mp and 2 other fields
  • used_price is highly correlated with screen_size and 6 other fields
  • screen_size is highly correlated with battery and 1 other field
  • main_camera_mp is highly correlated with selfie_camera_mp
  • selfie_camera_mp is highly correlated with main_camera_mp and 1 other field
  • battery is highly correlated with screen_size and 1 other field
  • weight is highly correlated with screen_size and 1 other field
  • release_year is highly correlated with selfie_camera_mp and 1 other field
  • days_used is highly correlated with release_year
  • new_price is highly correlated with used_price
  • used_price is highly correlated with new_price
  • weight is highly correlated with screen_size and 1 other field
  • new_price is highly correlated with used_price
  • used_price is highly correlated with new_price and 2 other fields
  • selfie_camera_mp is highly correlated with release_year and 3 other fields
  • main_camera_mp is highly correlated with brand_name and 1 other field
  • release_year is highly correlated with selfie_camera_mp and 6 other fields
  • brand_name is highly correlated with selfie_camera_mp and 8 other fields
  • os is highly correlated with brand_name and 2 other fields
  • screen_size is highly correlated with weight and 4 other fields
  • days_used is highly correlated with release_year and 4 other fields
  • 5g is highly correlated with used_price and 3 other fields
  • int_memory is highly correlated with brand_name and 3 other fields
  • 4g is highly correlated with selfie_camera_mp and 4 other fields
  • battery is highly correlated with weight and 7 other fields
  • ram is highly correlated with used_price and 5 other fields
  • 4g is highly correlated with brand_name
  • os is highly correlated with brand_name
  • brand_name is highly correlated with 4g and 1 other field
  • main_camera_mp has 180 (5.0%) missing values
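Warnings like these can be regenerated outside the profiling report by computing correlation matrices and listing every pair whose absolute correlation passes a threshold. For the regression objective, such pairs (for example screen_size, battery, and weight) are worth keeping in mind as potential multicollinearity. This is a minimal sketch: the helper `high_correlation_pairs` and the 0.7 threshold are assumptions and not necessarily what the report used, and it only covers numeric columns, so the categorical warnings involving brand_name and os are out of its scope.

```python
import pandas as pd


def high_correlation_pairs(df, threshold=0.7, methods=("pearson", "spearman")):
    """List numeric column pairs whose absolute correlation is at least `threshold`.

    Each method is checked separately, so one pair can appear more than once,
    much like the repeated warnings in the profiling report.
    """
    numeric = df.select_dtypes("number")
    rows = []
    for method in methods:
        corr = numeric.corr(method=method)
        cols = list(corr.columns)
        # Visit each unordered pair once (upper triangle of the matrix).
        for i, a in enumerate(cols):
            for b in cols[i + 1:]:
                r = corr.loc[a, b]
                if abs(r) >= threshold:
                    rows.append((method, a, b, round(float(r), 3)))
    return pd.DataFrame(rows, columns=["method", "var_1", "var_2", "r"])


# Toy data: b is a rescaled copy of a; c is only weakly related to either.
toy = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [2, 5, 1, 4, 3]})
pairs = high_correlation_pairs(toy)
print(pairs)
```

Only the (a, b) pair is flagged, once per method, while c's correlation of 0.1 with the other columns stays below the threshold.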